Abstract

We prove that the size of the sparsest directed $k$-spanner of a graph can be approximated
in polynomial time to within a factor of $\tilde{O}(\sqrt{n})$, for all $k \ge 3$. This improves the
$\tilde{O}(n^{2/3})$-approximation recently shown by Dinitz and Krauthgamer [DK10].

1 Introduction 



A spanner of a graph generally denotes a sparse subgraph which preserves all pairwise distances
up to a given approximation. More specifically, given a graph $G = (V, E)$ and an integer $k \ge 1$,
define a $k$-spanner of $G$ to be a subgraph $H = (V, E_H)$ such that, for any two vertices
$u, v \in V$,

$$\mathrm{dist}_H(u, v) \le k \cdot \mathrm{dist}_G(u, v).$$

If each edge of graph $G$ has an associated nonnegative length, then $\mathrm{dist}_G(u, v)$ denotes the
smallest sum of the lengths of edges along a path from $u$ to $v$. Spanners have numerous applications,
such as efficient routing [Cow01, CW04, PU89b, RTZ02, TZ01], simulating synchronized protocols in
unsynchronized networks [PU89a], parallel, distributed and streaming algorithms for approximating
shortest paths [Coh98, Coh00, Elk01, FKM+08], and algorithms for distance oracles [BS06, TZ05].
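The definition above can be checked mechanically on small inputs. The following is a minimal sketch, assuming vertices are numbered $0, \dots, n-1$ and edges are given as `(u, v, length)` triples; the function names `dijkstra`, `build_adj` and `is_k_spanner` are illustrative and do not come from the paper.

```python
import heapq

def dijkstra(n, adj, src):
    """Shortest-path distances from src; adj[u] is a list of (v, length)."""
    dist = [float("inf")] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build_adj(n, edges):
    """Adjacency lists of a directed graph given as (u, v, length) triples."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    return adj

def is_k_spanner(n, edges, sub_edges, k):
    """Check dist_H(u, v) <= k * dist_G(u, v) for every ordered pair (u, v)."""
    adj_g, adj_h = build_adj(n, edges), build_adj(n, sub_edges)
    for u in range(n):
        dist_g, dist_h = dijkstra(n, adj_g, u), dijkstra(n, adj_h, u)
        for v in range(n):
            # Pairs that are unreachable in G impose no constraint.
            if dist_g[v] < float("inf") and dist_h[v] > k * dist_g[v]:
                return False
    return True
```

For example, on the directed triangle with unit-length edges $0 \to 1$, $1 \to 2$, $0 \to 2$, dropping the edge $0 \to 2$ leaves a 2-spanner but not a 1-spanner.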

For integer $k \ge 1$, the computational problem Directed $k$-Spanner is the task of finding the
minimum number of edges in a $k$-spanner of an input directed graph $G$. Peleg and Schaffer [PS89]
and Cai [Cai93] show that Directed $k$-Spanner is NP-hard for every $k \ge 2$. The approximability
of Directed $k$-Spanner has also been well-studied. When $k = 2$, Kortsarz and Peleg [KP94],
and Elkin and Peleg [EP01], showed a tight $O(\log n)$ approximation. For $k = 3$ and $k = 4$,
Berman, Raskhodnikova and Ruan [BRR10] designed an $\tilde{O}(\sqrt{n})$-approximation algorithm. Dinitz
and Krauthgamer [DK10] independently showed $\tilde{O}(\sqrt{n})$ approximability for $k = 3$ and gave an
$\tilde{O}(n^{2/3})$-approximation algorithm valid for all $k \ge 4$, thus obtaining for the first time for this
problem an approximation ratio that does not degrade with increasing $k$. It is known [EP07] that
for every $\epsilon, \delta \in (0, 1)$, Directed $k$-Spanner with $3 \le k = o(n^{1-\delta})$ is inapproximable in polynomial
time to within a factor of $2^{\log^{1-\epsilon} n}$, unless $NP \subseteq DTIME(n^{\mathrm{polylog}\, n})$.

Our main result is the following.
Our main result is the following. 

Theorem 1 For any integer $k \ge 3$, there is a polynomial time algorithm that, in expectation, finds
an $\tilde{O}(\sqrt{n})$-approximation for the Directed $k$-Spanner problem.

This matches the $\tilde{O}(\sqrt{n})$ bound conjectured by Dinitz and Krauthgamer. Our algorithm, similar to
that of [DK10] and the earlier [BGJ+09], operates by letting the edges of the directed $k$-spanner
be the union of two subsets of edges of the original graph, the first obtained by rounding the
solution to a linear programming (LP) relaxation of the problem and the second obtained by edges
of shortest-path trees growing from randomly selected vertices. Our main technical innovation is
in designing and analyzing a new rounding algorithm that selects each edge of the original graph
independently with probability proportional to its LP value.

2 Proof of Main Result 
2.1 The LP Relaxation 

The LP relaxation that we use for the Directed $k$-Spanner problem is exactly the same as the
flow-based LP introduced by Dinitz and Krauthgamer [DK10]. Let us reproduce it here. Suppose
we are given a directed graph $G = (V, E)$ with a length function $\ell : E \to \mathbb{R}^{+}$. For every edge
$(u, v) \in E$, define $\mathcal{P}_{u,v}$ to be the set of all directed paths $p$ in $G$ from $u$ to $v$ such that the length of $p$
is at most $k$ times the length of the shortest path in $G$ from $u$ to $v$. Thus, for every edge $(u, v) \in E$,
the $k$-spanner of $G$ must contain at least one path from $\mathcal{P}_{u,v}$. The LP will have variables $x_e$ for
each edge $e \in E$, representing whether or not $e$ is included in the spanner, and variables $f_p$ for each
path $p$ in $\bigcup_{(u,v) \in E} \mathcal{P}_{u,v}$, representing the flow along the path $p$. The LP is then as follows:

$$\begin{aligned}
\min \quad & \sum_{e \in E} x_e \\
\text{s.t.} \quad & \sum_{p \in \mathcal{P}_{u,v} :\, e \in p} f_p \le x_e && \forall (u,v) \in E,\ \forall e \in E, \\
& \sum_{p \in \mathcal{P}_{u,v}} f_p \ge 1 && \forall (u,v) \in E, \\
& x_e \ge 0 && \forall e \in E, \\
& f_p \ge 0 && \forall (u,v) \in E,\ \forall p \in \mathcal{P}_{u,v}.
\end{aligned}$$

Notice that the number of variables can be exponential for large $k$, and hence, a priori, it might
not be clear that the LP can be solved optimally in polynomial time. However, by a separation
oracle argument, one can find an approximate solution in polynomial time for any $k$.

Theorem 2 (Theorem 2.1 in Dinitz and Krauthgamer [DK10]) There is a polynomial time
$(1 + \epsilon)$-approximation algorithm for the above LP for any constant $\epsilon > 0$.
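To see why the path variables can be numerous, it may help to enumerate a path set $\mathcal{P}_{u,v}$ on a toy instance. The sketch below is a brute-force illustration only; it is not the separation-oracle method of [DK10]. It assumes the graph is a dictionary mapping each vertex to a list of `(neighbor, length)` pairs, that the caller supplies `bound` equal to $k$ times the shortest-path distance from $u$ to $v$, and it restricts attention to simple paths. The name `stretch_paths` is hypothetical.

```python
def stretch_paths(adj, u, v, bound):
    """All simple directed paths from u to v with total length <= bound."""
    paths = []

    def extend(path, length):
        node = path[-1]
        if node == v:
            paths.append(tuple(path))
            return
        for nxt, w in adj.get(node, []):
            # Prune as soon as the length budget would be exceeded.
            if nxt not in path and length + w <= bound:
                path.append(nxt)
                extend(path, length + w)
                path.pop()

    extend([u], 0)
    return paths
```

On the unit-length triangle with edges $0 \to 1$, $1 \to 2$, $0 \to 2$, the edge $(0, 2)$ has two admissible paths for $k = 2$ and a single one for $k = 1$.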

2.2 Constructing the $k$-spanner

Given a fractional solution $(x, f)$ to the above LP, we now describe how to construct a $k$-spanner
$H = (V, E_H)$ for the input graph $G = (V, E)$. The edge set $E_H \subseteq E$ is constructed via the following
simple randomized algorithm (with a parameter $\alpha$):

1. Let $E_H = \emptyset$.

2. For each edge $e \in E$ independently, add $e$ to $E_H$ with probability $\min(\alpha x_e \sqrt{n}, 1)$.

3. Let $S$ be a set of vertices, each chosen independently at random from $V$ with probability
$\min(\alpha/\sqrt{n}, 1)$. For each $v \in S$, add the edges of the outward shortest-path tree
and the inward shortest-path tree rooted at $v$ to $E_H$.
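The three steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the LP values $x_e$ are taken as given in a dictionary `x` (solving the LP is not shown), the graph has no parallel edges, vertices are numbered $0, \dots, n-1$, and the names `round_spanner` and `shortest_path_tree` are hypothetical.

```python
import heapq
import math
import random

def shortest_path_tree(n, adj, root):
    """Edges (parent, child) of a shortest-path tree from root, via Dijkstra."""
    dist = [math.inf] * n
    parent = [None] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return {(parent[v], v) for v in range(n) if parent[v] is not None}

def round_spanner(n, edges, x, alpha, rng):
    """edges: list of (u, v, length); x: dict (u, v) -> LP value x_e."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]  # reversed graph, used for inward trees
    for u, v, w in edges:
        out_adj[u].append((v, w))
        in_adj[v].append((u, w))
    e_h = set()  # step 1
    for u, v, _ in edges:  # step 2: independent rounding of each edge
        if rng.random() < min(alpha * x[(u, v)] * math.sqrt(n), 1.0):
            e_h.add((u, v))
    for s in range(n):  # step 3: sample roots and add both trees
        if rng.random() < min(alpha / math.sqrt(n), 1.0):
            e_h |= shortest_path_tree(n, out_adj, s)
            # Inward tree: compute on the reversed graph, then flip edges back.
            e_h |= {(b, a) for a, b in shortest_path_tree(n, in_adj, s)}
    return e_h
```

A call such as `round_spanner(n, edges, x, alpha, random.Random(0))` returns the set $E_H$ as pairs `(u, v)`; passing an explicit random generator keeps runs reproducible.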

We first give a proof for a special case when all edges have unit lengths (i.e., $\ell(e) = 1$ for all
$e \in E$) and $k$ is upper-bounded by a constant. Then, we give an alternative proof that works for
general edge lengths and arbitrary $k$.

2.3 Unit length edges 

Theorem 3 For graphs with unit edge lengths, the approximation ratio of the randomized algorithm
described above (with $\alpha = 10\sqrt{k}\log n$) is $30\sqrt{nk}\log n$.



Proof We first show that the expected cost of the solution is at most $30\sqrt{nk}\log n$ times the
optimal cost. Then, we prove that the obtained solution is feasible.

The expected number of edges sampled at the second step is at most $10\sqrt{nk}\log n \cdot LP$, where $LP$ is the
cost of the LP. The cost of every shortest-path tree is at most the cost of the optimal solution, since
the optimal solution must contain at least $(n - 1)$ edges (here we assume that $G$ is connected, since
we can handle different connected components separately). Thus, the expected number of edges
added at the third step is at most $20\sqrt{nk}\log n \cdot OPT$. The total expected cost of the solution is at
most $30\sqrt{nk}\log n \cdot OPT$.
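The cost bound can be written out as one chain. This is a sketch of the accounting, assuming $LP \le OPT$ (the LP is a relaxation) and using $E[|S|] \le n \cdot \alpha/\sqrt{n} = \alpha\sqrt{n}$, with two trees of at most $n - 1$ edges per sampled vertex:

```latex
\begin{align*}
\mathbb{E}\bigl[|E_H|\bigr]
  &\le \sum_{e \in E} \alpha \sqrt{n}\, x_e \;+\; \mathbb{E}\bigl[|S|\bigr] \cdot 2(n-1) \\
  &\le \alpha \sqrt{n} \cdot LP \;+\; 2\alpha \sqrt{n} \cdot OPT \\
  &\le 3\alpha \sqrt{n} \cdot OPT
   \;=\; 30 \sqrt{nk}\, \log n \cdot OPT
   \qquad \text{for } \alpha = 10\sqrt{k}\log n .
\end{align*}
```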

We now prove that the solution returned by the algorithm is feasible with probability at least $1 - n^{-3}$.
It suffices to show that with probability at least $1 - n^{-3}$, for every pair of adjacent vertices $u$ and $v$
($(u,v) \in E$) there exists a path in $H$ of length at most $k$ connecting $u$ and $v$. Fix an arbitrary
$(u,v) \in E$.

Let, as before, $\mathcal{P}_{u,v}$ be the set of all paths of length at most $k$ going from $u$ to $v$, and let

$$V_{u,v} = \bigcup_{p \in \mathcal{P}_{u,v}} p$$

be the set of vertices covered by these paths. We consider two cases. If $|V_{u,v}| \ge \sqrt{n/k}$, then the set
$S$ sampled at the third step of the algorithm contains at least one vertex from $V_{u,v}$ with probability

$$1 - \left(1 - \frac{10\sqrt{k}\log n}{\sqrt{n}}\right)^{|V_{u,v}|} \ge 1 - \left(1 - \frac{10\log n}{\sqrt{n/k}}\right)^{\sqrt{n/k}} \ge 1 - e^{-10\log n} = 1 - n^{-10}.$$

If $S \cap V_{u,v} \ne \emptyset$, we pick an $s \in S \cap V_{u,v}$. Since $s \in V_{u,v}$, $\mathrm{dist}_G(u, s) + \mathrm{dist}_G(s, v) \le k$. Hence, the
union of the inward and outward shortest-path trees rooted at $s$ (that the algorithm adds to $E_H$
at the third step) contains a path going from $u$ to $v$ of length at most $k$.

We now consider the case $|V_{u,v}| < \sqrt{n/k}$. Perform a mental experiment. Make $k$ copies of each
vertex $w \in V_{u,v} \setminus \{u\}$: $(w, 1), (w, 2), \dots, (w, k)$; and make a copy of vertex $u$: $(u, 0)$. For every path
$p \in \mathcal{P}_{u,v}$ define a new path $\hat{p}$ from $u$ to $v$ as

$$\hat{p}_i = (p_i, i),$$

where $p_i$ and $\hat{p}_i$ are the $i$-th vertices of $p$ and $\hat{p}$ respectively (counting from $i = 0$, so that
$\hat{p}_0 = (u, 0)$). We assign flow $f_p$ to $\hat{p}$. The total
flow going from $(u, 0)$ to $(v, k')$ (for $k' \le k$) in the new graph is at least 1. We now start removing
"light" vertices from the new graph. Initially, let $\hat{V} = V_{u,v} \times \{1, 2, \dots, k\} \cup \{(u, 0)\}$ (i.e., $\hat{V}$ is the
set of all "copies" $(w,i)$) and $\hat{\mathcal{P}} = \{\hat{p} : p \in \mathcal{P}_{u,v}\}$. Note that $|\hat{V}| \le |V_{u,v}| \times k \le \sqrt{nk}$. At every step
we find a vertex $(w,i) \in \hat{V}$, $w \ne v$, the flow through which is less than $1/\sqrt{2nk}$ (we call this vertex
a "light" vertex), i.e., a vertex $(w,i) \in \hat{V}$, $w \ne v$, such that

$$\sum_{\hat{p} \in \hat{\mathcal{P}} :\, (w,i) \in \hat{p}} f_p < \frac{1}{\sqrt{2nk}}.$$

We then remove $(w,i)$ from $\hat{V}$ and all paths $\hat{p} \ni (w,i)$ from $\hat{\mathcal{P}}$. We stop when there are no more
such vertices $(w, i)$ left. Note that after removing each vertex $(w, i)$ and all paths going through it,
we recompute the flow going through the remaining vertices.

We remove at most $\sqrt{nk}$ vertices from $\hat{V}$ (simply because before removing "light" vertices the
size of $\hat{V}$ was at most $\sqrt{nk}$), and at most $\sqrt{nk}/\sqrt{2nk} = \sqrt{2}/2$ units of flow. Thus, the remaining
weight of paths $\hat{p} \in \hat{\mathcal{P}}$ is at least $1 - \sqrt{2}/2 > 1/4$; and thus $(u, 0) \in \hat{V}$.
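The pruning of "light" vertices is part of the analysis rather than of the algorithm, but it can be simulated on a toy instance to see how the flow is recomputed after each removal. The sketch below assumes the layered paths are given explicitly as tuples of copies `(w, i)` with matching flow values, and that copies of one designated vertex (the parameter `exempt`, taken here to be $v$) are never removed. The name `prune_light_vertices` is hypothetical.

```python
import math

def prune_light_vertices(paths, flow, n, k, exempt):
    """paths[j] is a tuple of copies (w, i); flow[j] is the flow f_p on it.

    Repeatedly delete a copy (w, i) with w != exempt that carries less than
    1/sqrt(2nk) flow, together with every path through it.
    Returns the surviving copies and the indices of the surviving paths.
    """
    threshold = 1.0 / math.sqrt(2 * n * k)
    alive = set(range(len(paths)))
    vertices = {copy for p in paths for copy in p}
    changed = True
    while changed:
        changed = False
        for copy in sorted(vertices):
            if copy[0] == exempt:
                continue
            # Flow is recomputed over the paths that are still alive.
            through = sum(flow[j] for j in alive if copy in paths[j])
            if through < threshold:
                vertices.discard(copy)
                alive = {j for j in alive if copy not in paths[j]}
                changed = True
                break
    return vertices, alive
```

With $n = 4$ and $k = 2$ the threshold is $1/4$, so a copy carrying $0.1$ units of flow is removed together with its path, while copies carrying $0.9$ units survive.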

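Now, pick an arbitrary vertex $(w, i) \in \hat{V}$. Observe that if $w \ne v$,

$$\sum_{(w,w') \in E :\, (w', i+1) \in \hat{V}} x_{(w,w')} \;\ge \sum_{p \in \mathcal{P}_{u,v} :\, p_i = w,\ (p_{i+1}, i+1) \in \hat{V}} f_p \;\ge \sum_{\hat{p} \in \hat{\mathcal{P}} :\, (w,i) \in \hat{p}} f_p \;\ge\; \frac{1}{\sqrt{2nk}}.$$

Hence, the set $E_H$ contains at least one edge $(w,w')$ such that $(w', i + 1) \in \hat{V}$ with probability

$$1 - \prod_{(w,w') \in E :\, (w', i+1) \in \hat{V}} \left(1 - 10\, x_{(w,w')} \sqrt{nk}\log n\right) \;\ge\; 1 - \prod_{(w,w') \in E :\, (w', i+1) \in \hat{V}} e^{-10\, x_{(w,w')} \sqrt{nk}\log n} \;\ge\; 1 - e^{-5\sqrt{2}\log n} \;\ge\; 1 - n^{-7},$$

if for all $(w', i + 1) \in \hat{V}$, $x_{(w,w')} < (10\sqrt{nk}\log n)^{-1}$; and with probability 1, otherwise. By the union
bound, with probability at least $1 - n^{-5}$ for all $(w,i) \in \hat{V}$ there is such $(w,w')$ in $E_H$. Thus, there
exists a path $w_0 = u, w_1, \dots, w_{k'} = v$ of length at most $k$ such that $(w_i, i) \in \hat{V}$ and $(w_i, w_{i+1}) \in E_H$
for every $i < k'$. Hence, $u$ and $v$ are connected with a path of length at most $k$ with probability at least
$1 - n^{-5}$.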

We showed that for every $(u, v) \in E$, with probability $1 - n^{-5}$, there exists a path in $H = (V, E_H)$
of length at most $k$ connecting $u$ and $v$; and therefore, by the union bound over the at most $n^2$ edges,
with probability at least $1 - n^{-3}$, $E_H$ is a $k$-spanner. ■

2.4 General case 

Theorem 4 The approximation ratio of the randomized algorithm described above (with $\alpha = 5\log n$)
is $15\sqrt{n}\log n$.

Remark The first several steps in the proof are the same as in the proof of Theorem 3. However,
the proof of the main case ($|V_{u,v}| < \sqrt{n}$) is very different from the previous proof.

Proof The expected number of edges sampled at the second step is at most $5\sqrt{n}\log n \cdot LP$, where $LP$ is
the cost of the LP. The cost of every shortest-path tree is at most the cost of the optimal solution,
since the optimal solution must contain at least $(n - 1)$ edges (here we assume that $G$ is connected,
since we can handle different connected components separately). In expectation, we add the edges
of at most $2\alpha\sqrt{n} = 10\sqrt{n}\log n$ shortest-path trees, and so the expected number of edges added
at the third step is at most $10\sqrt{n}\log n \cdot OPT$. The total expected cost of the solution is at most
$15\sqrt{n}\log n \cdot OPT$.

We now prove that the solution returned by the algorithm is feasible with probability at least
$1 - n^{-3}$. Consider two arbitrary adjacent vertices $u$ and $v$ ($(u,v) \in E$). We show that with
probability $1 - n^{-5}$, there exists a path in $H$ of length at most $k \cdot \ell(u,v)$ connecting $u$ and $v$.

As in Theorem 3 we consider two cases: the set $V_{u,v} = \bigcup_{p \in \mathcal{P}_{u,v}} p$ is large ($|V_{u,v}| \ge \sqrt{n}$) and
small ($|V_{u,v}| < \sqrt{n}$). For the case $|V_{u,v}| \ge \sqrt{n}$, we use exactly the same proof as before to show
that $u$ and $v$ are connected with a short path which is contained in the union of two shortest-path
trees (see Theorem 3).

So, consider the second case, $|V_{u,v}| < \sqrt{n}$. Let $G' = G[V_{u,v}]$ be the graph induced on the vertex
set $V_{u,v}$. We denote the set of edges of $G'$ by $E' = \{(w_1, w_2) \in E : w_1, w_2 \in V_{u,v}\}$, $H' = H \cap G'$ and
$E'_H = E_H \cap E'$. We show that the set $E_H$ satisfies the dual LP constraints with high probability.

Consider an arborescence $T \subseteq G'$ rooted at $u$. Let $d_T$ be the shortest path metric on $T$. Define
a function $L_T : V_{u,v} \to \mathbb{R}^{+}$,

$$L_T(w) = d_T(u, w),$$

and let

$$S_T = \{(w_1, w_2) \in E' : L_T(w_2) > L_T(w_1) + \ell(w_1, w_2)\}.$$

In Section 2.5, we prove the following claims. 

Claim 5 A subgraph $H' \subseteq G'$ contains a directed path of length at most $K$ connecting $u$ and $v$, if
and only if for every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u,v) > K$,

$$E'_H \cap S_T \ne \emptyset.$$

Remark This claim is an analog of the special case of the min-cut/max-flow theorem: A subgraph
$H'$ contains a directed path connecting $u$ and $v$ if and only if every cut separating $u$ and $v$ in $G'$
contains an edge from $E'_H$.

Claim 6 For every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u,v) > k \cdot \ell(u,v)$,

$$\sum_{e \in S_T} x_e \ge 1,$$

where $\{x_e\}$ is the LP solution.
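The set $S_T$ is straightforward to compute once the tree distances $L_T$ are known. The following is a minimal sketch, assuming edges of $G'$ are given as `(w1, w2, length)` triples and $L_T$ as a dictionary `tree_dist`; the name `violated_edges` is illustrative and not from the paper.

```python
def violated_edges(edges, tree_dist):
    """S_T: the edges (w1, w2) of G' with L_T(w2) > L_T(w1) + length(w1, w2)."""
    return {
        (w1, w2)
        for w1, w2, length in edges
        if tree_dist[w2] > tree_dist[w1] + length
    }
```

As a small illustration of Claim 5, take $u = 0$, $v = 2$, edges $0 \to 1$ and $1 \to 2$ of length 1 and $0 \to 2$ of length 3, and $K = 2$. The arborescence $T = \{0 \to 1, 0 \to 2\}$ has $d_T(0, 2) = 3 > K$ and $S_T = \{(1, 2)\}$, so any subgraph containing a path of length at most 2 from 0 to 2 must use the edge $(1, 2)$. For the arborescence $\{0 \to 1, 1 \to 2\}$, which has $d_T(0,2) = 2$, the set $S_T$ is empty.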

We show that for every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u, v) > k \cdot \ell(u,v)$,

$$E'_H \cap S_T \ne \emptyset$$

with high probability; and thus, by Claim 5, there exists a path in $E'_H$ of length at most $k \cdot \ell(u, v)$
connecting $u$ and $v$. Let $X_e$ be the indicator random variable defined as follows: $X_e = 1$ if $e \in E_H$;
$X_e = 0$ otherwise. If for some $e \in S_T$, $x_e \ge 1/(\alpha\sqrt{n})$, then $X_e = 1$ with probability 1. Otherwise,
for every $e \in S_T$, $\Pr(X_e = 1) = \alpha\sqrt{n}\, x_e$.

Estimate the probability that for all $e \in S_T$, $X_e = 0$:

$$\Pr(X_e = 0 \text{ for all } e \in S_T) = \prod_{e \in S_T} \bigl(1 - \Pr(X_e = 1)\bigr) \le \prod_{e \in S_T} e^{-\Pr(X_e = 1)} = e^{-\sum_{e \in S_T} \Pr(X_e = 1)} = e^{-\sum_{e \in S_T} \alpha\sqrt{n}\, x_e}.$$

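By Claim 6,

$$\sum_{e \in S_T} x_e \ge 1;$$

thus, $\Pr(X_e = 0 \text{ for all } e \in S_T) \le e^{-\alpha\sqrt{n}}$. Hence, $\Pr\bigl(\bigcup_{e \in S_T} \{X_e = 1\}\bigr) \ge 1 - e^{-\alpha\sqrt{n}}$. In other words,
for a fixed arborescence $T$,

$$\Pr(S_T \cap E'_H \ne \emptyset) \ge 1 - e^{-\alpha\sqrt{n}}.$$
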
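The total number of arborescences can be bounded by

$$|V_{u,v}|^{|V_{u,v}|} \le \sqrt{n}^{\sqrt{n}} = e^{\frac{1}{2}\sqrt{n}\log n},$$

since for every vertex $w \in V_{u,v}$ there are at most $|V_{u,v}|$ possible ways to choose a parent node $w' \in V_{u,v}$.
Hence, by the union bound, with probability at least

$$1 - e^{\frac{1}{2}\sqrt{n}\log n} \cdot e^{-5\sqrt{n}\log n} \ge 1 - e^{-\sqrt{n}\log n}$$

there exists a path of length at most $k \cdot \ell(u, v)$ between $u$ and $v$ in $H$. ■
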
2.5 Proofs of Claim 5 and Claim 6 

Claim 5 A subgraph $H' \subseteq G'$ contains a directed path of length at most $K$ connecting $u$ and $v$, if
and only if for every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u,v) > K$,

$$E'_H \cap S_T \ne \emptyset.$$

Proof 

I. Suppose that there is a path $p$ of length at most $K$ connecting $u$ and $v$ in $H'$. Consider an
arbitrary arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u,v) > K$. Write

$$\mathrm{length}(p) = \sum_{i=1}^{|p|-1} \ell(p_i, p_{i+1}) \le K.$$

Then,

$$\sum_{i=1}^{|p|-1} \bigl(L_T(p_{i+1}) - L_T(p_i)\bigr) = L_T(p_{|p|}) - L_T(p_1) = L_T(v) - L_T(u) > K.$$

Hence, for some $i$,

$$\ell(p_i, p_{i+1}) < L_T(p_{i+1}) - L_T(p_i),$$

and therefore, $(p_i, p_{i+1}) \in S_T$ (by the definition of $S_T$). Since $(p_i, p_{i+1}) \in E'_H$, $S_T \cap E'_H \ne \emptyset$.

II. Now, assume that for every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u, v) > K$,

$$E'_H \cap S_T \ne \emptyset.$$

Let $T$ be the directed shortest path tree in $E'_H$ rooted at $u$. We claim that $S_T \cap E'_H = \emptyset$ and
thus $d_T(u,v) \le K$. Indeed, for all $(w_1, w_2) \in E'_H$,

$$L_T(w_2) = d_T(u, w_2) \le d_T(u, w_1) + \ell(w_1, w_2) = L_T(w_1) + \ell(w_1, w_2);$$

here we have used that $T$ is the shortest path tree and thus $d_T(u, w_2) \le d_T(u, w_1) + \ell(w_1, w_2)$.
Therefore, $(w_1, w_2) \notin S_T$. Since $T \subseteq H'$, the tree path from $u$ to $v$ is a path in $H'$ of length
$d_T(u,v) \le K$.



Claim 6 For every arborescence $T \subseteq G'$ rooted at $u$ with $d_T(u,v) > k \cdot \ell(u,v)$,

$$\sum_{e \in S_T} x_e \ge 1,$$

where $\{x_e\}$ is the LP solution.

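Proof Since $\{x_e\}$ is part of an LP solution:

$$\sum_{e \in S_T} x_e \;\ge\; \sum_{e \in S_T} \ \sum_{p \in \mathcal{P}_{u,v} :\, e \in p} f_p \;\ge\; \sum_{p \in \mathcal{P}_{u,v}} f_p \;\ge\; 1.$$

The first and third inequalities are from the definition of the LP. The second inequality follows
because, by Claim 5, every path $p \in \mathcal{P}_{u,v}$ contains at least one edge in $S_T$ and because each $f_p$ is
nonnegative. ■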

3 Conclusion 

We proved above an $\tilde{O}(\sqrt{n})$-approximation for Directed $k$-Spanner. This settles the conjecture
of Dinitz and Krauthgamer [DK10]. Note that Elkin and Peleg [EP07] have shown that the
approximation ratio for this problem cannot be expected to be $O(n^{1/k})$ or even [...], in contrast
to the situation for undirected graphs.

Our algorithm obviously applies to special cases of the Directed $k$-Spanner problem, such
as the $k$-Transitive Closure Spanner problem [BGJ+09]. It also straightforwardly extends to
the Client-Server $k$-Spanner problem and the $k$-Diameter Spanning Subgraph problem.
See [EP05] for definitions and motivation; we omit details here. Finally, consider the General
Directed $k$-Spanner problem, defined in [EP07]. Here, each edge of the input directed graph has
a nonnegative weight as well as a length, and the objective is to minimize the sum of the weights
of the edges in the $k$-spanner. We note that our $\tilde{O}(\sqrt{n})$ approximation ratio still applies to this
problem, as long as the weight of each edge is lower-bounded by a positive constant.